Abstract

Evolutionary events such as incomplete lineage sorting and lateral
gene transfer constitute major problems for inferring species trees from
gene trees, as they can sometimes lead to gene trees which conflict
with the underlying species tree. One particularly simple and efficient
way to infer species trees from gene trees under such conditions is to
combine three-taxon analyses for several genes using a majority vote
approach. For incomplete lineage sorting this method is known to be
statistically consistent; however, in the case of lateral gene transfer it
is known that a zone of inconsistency does exist for a specific
four-taxon tree topology. In this paper we analyze all remaining four-taxon
topologies and show that no other inconsistencies exist.

Keywords: Phylogenetic trees, lateral gene transfer, statistical consistency
1 Introduction

A major problem in inferring species trees from gene trees is that different
genes often suggest different evolutionary histories [1]. This phenomenon is
caused by incomplete lineage sorting and reticulate evolutionary events, i.e.
hybridization and lateral gene transfer, and it naturally raises the question
of whether the underlying species tree can be consistently reconstructed from
a set of gene trees. In the case of hybridization, it is clear that no single
tree can adequately describe the evolution of the taxa under study, and that
a network is usually a more appropriate representation. For incomplete
lineage sorting, recent theoretical work based on the multi-species coalescent
has shown that the most probable gene tree topology can differ from the
species tree topology when the number of taxa is greater than three. By
contrast it has long been known that for triplets, the matching topology is
the most probable topology [3]. Complementary to this, it was recently
proved that, under the standard or extended models of lateral gene transfer
(LGT), the matching gene tree topology is also the most probable topology
for a tree with three taxa; but for the fork-shaped four-taxon tree topology
there exist branch lengths for which the matching topology of a triplet has
the lowest probability of the three possible topologies [6]. In this paper
we start by recalling the two models of lateral gene transfer and the key
definitions from [6]. We then give a thorough analysis of the other
four-taxon tree topology (the pectinate topology), showing that in this case, the
matching topology for a set of three leaves is always the most probable
topology, regardless of the location of the fourth taxon. This completes the
four-taxon case and implies that four-taxon species trees can be consistently
reconstructed using a triplet-based majority vote approach, provided that
the branch lengths meet the conditions given in [6].
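The triplet-based majority vote referred to above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each gene tree has already been reduced to the rooted triplet topology it displays for one fixed three-taxon set, encoded as a string such as "ab|c"; the function name `majority_triplet` is hypothetical and does not come from the paper or from [6].

```python
from collections import Counter


def majority_triplet(gene_tree_triplets):
    """Return the most frequent triplet topology, or None if there is no unique winner.

    gene_tree_triplets: iterable of strings such as "ab|c", one per gene tree
    (an assumed encoding, chosen only for this sketch).
    """
    counts = Counter(gene_tree_triplets)
    ranked = counts.most_common(2)
    if not ranked:
        return None  # no gene trees supplied
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # the two best-supported topologies are tied
    return ranked[0][0]


# Five hypothetical gene trees voting on the triplet {a, b, c}.
votes = ["ab|c", "ab|c", "ac|b", "ab|c", "bc|a"]
winner = majority_triplet(votes)
```

Statistical consistency of this vote, as the number of genes grows, is exactly the property that requires the matching topology to be strictly more probable than each mismatching one.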

2 Key definitions 

Throughout this paper $X$ will denote a set of $n$ taxa, and $A$ will be a subset
of size 3 of $X$. Let $T$ be a rooted phylogenetic species tree with leaf set
$X$ and root $\rho$. Regarding $T$ as a 1-dimensional simplicial complex, so that each
point $p$ in $T$ is either a vertex or an element of an interval that corresponds
to an edge, we use a coalescence time scale $t : T \to [0, \infty)$, with coalescence
time increasing into the past, such that

• $t(p) = 0 \Leftrightarrow p$ is a leaf, and

• if $u$ is a descendant of $v$ then $t(u) < t(v)$.

We will denote the time from the present to the most recent common
ancestor (MRCA) of the two most closely related taxa in $A$ by $t_A$; e.g. if
$T|A = a|bc$ then $t_A$ is the time from the present to the MRCA of $b$ and $c$.

Linz et al. define what we will refer to as the standard LGT model
in p]. This model makes the following assumptions: 

1. A binary, labeled, rooted and clocklike species tree T is given, as well 
as all the splitting times along this tree; 

2. differences between a specific gene tree and T are only caused by LGT 
events; 

3. the transfer rate is homogeneous per gene and unit time; 

4. genes are transferred independently;



5. one copy of the transferred gene still remains in the donor genome; 
and 

6. the transferred gene replaces any existing orthologous counterpart in 
the acceptor genome. 

Based on this model, the authors in [6] considered an extended LGT model,
in which the rate of gene transfer between two lineages can be decreasing
in the distance between the two lineages. Specifically, letting $d(p, p')$ be the
evolutionary distance between contemporaneous points $p$ and $p'$ in $T$, item
three above is replaced by the following assumption:

3. transfer events on $T$ occur as a Poisson process through time, in which
the rate of transfer events from point $p$ on a lineage to a
contemporaneous point $p'$ on another lineage at time $t$ is $f(d(p, p'), t)$,
where $f(d, t)$ is a monotone non-increasing function in $d$ (but can vary
non-monotonically in $t$).

2.1 Lateral gene transfer events and transfer sequences 

A lateral gene transfer (LGT) on $T$ is an arc from $p \in T$ to $p' \in T$ where
$t(p) = t(p')$ and neither $p$ nor $p'$ is a vertex of $T$. We write $\sigma = (p, p')$ to
denote this transfer event and we write $t(\sigma)$ for the common value of $t(p)$
and $t(p')$. We will assume that no two transfer events occur at exactly the
same time.

Let $\sigma = \sigma_1 \ldots \sigma_k$ be a sequence of transfer events arranged in increasing
$t$-value:

$$0 < t(\sigma_1) < t(\sigma_2) < \cdots < t(\sigma_k) < t(\rho).$$

Given a species tree $T$ and a transfer sequence $\sigma = \sigma_1 \ldots \sigma_k$ on $T$, we
obtain an associated gene tree $T[\sigma]$. An LGT arc $\sigma$ from point $p$ to $p'$ in
$T$ describes the event that the gene which was present on the edge at $p'$
is replaced by the transferred gene from $p$. Thus, if we trace the history of
a gene from the present to the past, each time we encounter an incoming
horizontal arc into this edge, we follow this arc (against the direction of the
arc). Mathematically this is formalized as follows: For a transfer sequence
$\sigma = \sigma_1 \ldots \sigma_k$ where $\sigma_i = (p_i, p'_i)$, consider the tree $T$ together with a directed
edge for each $\sigma_i$ placed between $p_i$ and $p'_i$ for each $i \in \{1, \ldots, k\}$, and regard
this network as a one-dimensional simplicial complex. Now for each
$i \in \{1, \ldots, k\}$ delete the interval above $p'_i$ and consider the minimal connected
subgraph of the resulting complex that contains $X$. This is $T[\sigma]$.

Given the pair $T$, $\sigma = \sigma_1 \ldots \sigma_k$, define the following sequence of $X$-trees:

$$T_0 = T, \qquad T_r = T_{r-1}[\sigma_r].$$

And given $T' \in \{T_0, T_1, \ldots, T_k\}$, a point $p \in T'$ and a non-empty subset $Y$
of $X$, let $\mathrm{des}_Y(T', p)$ denote the subset of $Y$ whose elements are descendants
of $p$ in $T'$.



2.2 Triplet analysis 

Let $A$ be a subset of $X$ of size 3, let $T$ be a phylogenetic species tree on
$X$, let $\sigma$ be a sequence of transfer events on $T$, and let $\sigma_r = (p_r, p'_r)$ be a
specific transfer event on $T$. We say that:

• $\sigma$ induces a match for $A$ if $T|A = T[\sigma]|A$. Otherwise we say that $\sigma$
induces a mismatch for $A$.

• $\sigma_r$ is into an $A$-lineage if $\mathrm{des}_A(T, p'_r)$ is a single element in $A$.

• $\sigma_r$ is an $A$-transfer and it transfers $x$ if $\mathrm{des}_A(T_{r-1}, p'_r) = \{x\}$ for some
$x \in A$.

• $\sigma_r$ is an $A$-moving transfer and it moves $x$ if it transfers $x$ and
$\mathrm{des}_A(T_{r-1}, p_r) = \emptyset$.

• $\sigma_r$ is an $A$-joining transfer and it joins $x$ to $y$ if it transfers $x$ and
$\mathrm{des}_A(T_{r-1}, p_r) = \{y\}$ for some $y \in A$.

Note that any $A$-transfer is either an $A$-moving or an $A$-joining transfer.

Let $\sigma = \sigma_1, \sigma_2, \ldots, \sigma_k$ be a sequence of transfer events on $T$ with $t(\sigma_k) < t_A$
and no $A$-joining transfers. Then construct the sequence $T'_0, T'_1, \ldots, T'_k$ of
trees by the following procedure: Set $T'_0 = T$ and, for $i = 1, \ldots, k$, construct
$T'_i$ from $T'_{i-1}$:

• If $\sigma_i$ is not $A$-moving, set $T'_i = T'_{i-1}$,

• else if $\sigma_i = (p_i, p'_i)$ moves $x \in A$, let $T'_i$ be the tree obtained from $T'_{i-1}$
by

1. deleting all $p \in T'_{i-1}$ with $t(p) < t(\sigma_i)$,

2. labeling $p_i$ by $x$,

3. for both $z \in A - \{x\}$, assigning label $z$ to the unique point $p_z$ of
$T'_{i-1}$ that has $t(p_z) = t(\sigma_i)$ and $z \in \mathrm{des}_A(T'_{i-1}, p_z)$, and

4. regarding all other leaves in the tree as unlabeled.

The following two lemmas were given and proved in [6]:

Lemma 1 Let $\sigma$ be a sequence of transfer events on a rooted binary $X$-tree
$T$ and let $A = \{a, b, c\} \subseteq X$.

1. If $\sigma$ induces a mismatch for $A$, then $\sigma$ must contain an $A$-transfer
with a $t$-value less than $t_A$.

2. Moreover precisely one of the following occurs:

(a) $\sigma$ has no $A$-transfers. In this case, $\sigma$ induces a match for $A$.

(b) $\sigma$ contains at least one $A$-joining transfer. In this case, if the
first such transfer in $\sigma$ joins $x$ to $y$, then $T[\sigma]|A = z|xy$ where
$\{x, y, z\} = A$.

(c) $\sigma$ has no $A$-joining transfers, but it has an $A$-moving transfer
with a $t$-value less than $t_A$. In this case, if $\sigma_r$ denotes the first
such $A$-moving transfer in $\sigma$ then:

$$T[\sigma]|A = T[\sigma_r, \ldots, \sigma_k]|A.$$

Lemma 2 Suppose $\sigma = \sigma_1, \sigma_2, \ldots, \sigma_k$ is a sequence of transfer events on
a rooted binary $X$-tree $T$ with $t(\sigma_k) < t_A$ and with no $A$-joining transfers.
Then $T[\sigma]|A = T'_k|A$.

3 Three-taxon trees 

For completeness we restate the following result for three-taxon trees from [6]: 

Proposition 1 If T has just three taxa, then under the extended LGT 
model, the probability that a transfer sequence induces a match for the three 
taxa is strictly greater than the probability it induces either one of the two 
mismatch topologies (which have equal probability). 



4 Four-taxon trees 

For four-taxon trees there are two rooted binary tree topologies - the
fork-shaped topology with two cherries as shown in Fig. 1(a) and the pectinate
tree topology shown in Fig. 1(b). The fork-shaped topology was studied
thoroughly in [6], and we will study the pectinate tree topology.

For four-taxon trees on $a, b, c, d$, we will write $(ab; c; d)$ to denote the
pectinate tree topology depicted in Fig. 1(b). This topology is symmetric to
$(ba; c; d)$, $(d; c; ab)$ and $(d; c; ba)$, but no other symmetries hold. For any
pectinate four-taxon tree we denote the time of the MRCA of the two
most closely related taxa by $t_2$, and the time of the MRCA of the three
most closely related taxa by $t_3$. Thus, for example if the tree has topology
$(ab; c; d)$, $t_2$ is the time of the MRCA of $a$ and $b$, and $t_3$ is the time of the
MRCA of $a$, $b$ and $c$.

Figure 1: The two four-taxon tree topologies. (a) The fork-shaped four-taxon
tree topology. (b) The pectinate four-taxon tree topology $(ab; c; d)$.

The main result in this paper is the following theorem: 

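Theorem 1 Suppose $T$ is a pectinate four-taxon tree and $A = \{a, b, c\}$ is
a subset of the leaf set $X$ of $T$, and suppose that $T|A = ab|c$. Let
$\sigma = \sigma_1, \sigma_2, \ldots, \sigma_k$ be a random sequence of transfer events on $T$ generated by the
standard LGT model, in which the rate of transfer events from point
$p$ to a contemporaneous point $p'$ is $\lambda$. Then the probability that $\sigma$ induces a
match on $A$ is strictly higher than the probability that it induces either one
of the two mismatch topologies (which have equal probability).

More specifically: Let $\xi_{ab|c}$, $\xi_{ac|b}$ and $\xi_{bc|a}$ denote the disjoint events that $\sigma$
induces a tree displaying the triplet topologies $ab|c$, $ac|b$ or $bc|a$, respectively,
and let $\mathbb{P}(\xi_{ab|c})$, $\mathbb{P}(\xi_{ac|b})$ and $\mathbb{P}(\xi_{bc|a})$ be the probabilities of these. Then for
$\mu = \tfrac{1}{3}\lambda t_2$ and $B = 3\lambda(t_3 - t_2)$

(i) if $T$ is of type $(ab; c; *)$ we have:

$$\mathbb{P}(\xi_{ab|c}) = \tfrac{1}{3}\Bigl(1 + e^{-7\mu}\bigl(\tfrac{3}{4}e^{-B}e^{-4\mu} + (1 - \tfrac{1}{2}e^{-B})e^{-2\mu} + (1 - \tfrac{1}{4}e^{-B})\bigr)\Bigr), \tag{1}$$

and $\mathbb{P}(\xi_{ab|c}) > \tfrac{1}{3} > \mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a})$ for all values of $t_2$ and $t_3$;

(ii) if $T$ is of type $(ab; *; c)$ we have:

$$\mathbb{P}(\xi_{ab|c}) = \tfrac{1}{3}\Bigl(1 - e^{-7\mu}\bigl(\tfrac{3}{4}e^{-B}e^{-4\mu} - (1 - \tfrac{1}{2}e^{-B})e^{-2\mu} - (1 + \tfrac{5}{4}e^{-B})\bigr)\Bigr), \tag{2}$$

and $\mathbb{P}(\xi_{ab|c}) > \tfrac{1}{3} > \mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a})$ for all values of $t_2$ and $t_3$; and

(iii) if $T$ is of type $(a*; b; c)$ or $(b*; a; c)$ we have:

$$\mathbb{P}(\xi_{ab|c}) = \tfrac{1}{3}\Bigl(1 + e^{-7\mu}\bigl(\tfrac{3}{8}e^{-B}e^{-4\mu} - (\tfrac{1}{2} - \tfrac{1}{4}e^{-B})e^{-2\mu} + (\tfrac{1}{2} + \tfrac{11}{8}e^{-B})\bigr)\Bigr), \tag{3}$$

and $\mathbb{P}(\xi_{ab|c}) > \tfrac{1}{3} > \mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a})$ for all values of $t_2$ and $t_3$.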
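The three closed forms of Theorem 1 can be evaluated numerically. The sketch below is hedged: it assumes the reading $\mu = \lambda t_2 / 3$ and $B = 3\lambda(t_3 - t_2)$ and the coefficients 3/4, 1/2, 1/4, 5/4, 3/8 and 11/8 as reconstructed from the proofs in Sections 4.2 to 4.4; the function name `match_probability` and the case labels are hypothetical.

```python
import math


def match_probability(case, lam, t2, t3):
    """P(xi_{ab|c}) for a pectinate four-taxon tree, under the stated assumptions.

    case: "i"   for topology (ab; c; *),
          "ii"  for topology (ab; *; c),
          "iii" for topology (a*; b; c) or (b*; a; c).
    lam: transfer rate, t2 <= t3: MRCA times of the closest two / three taxa.
    """
    mu = lam * t2 / 3.0          # assumed scaling: mu = lambda * t2 / 3
    B = 3.0 * lam * (t3 - t2)    # assumed scaling: B = 3 * lambda * (t3 - t2)
    eB = math.exp(-B)
    e2 = math.exp(-2.0 * mu)
    e4 = math.exp(-4.0 * mu)
    if case == "i":
        inner = 0.75 * eB * e4 + (1.0 - 0.5 * eB) * e2 + (1.0 - 0.25 * eB)
    elif case == "ii":
        inner = -0.75 * eB * e4 + (1.0 - 0.5 * eB) * e2 + (1.0 + 1.25 * eB)
    elif case == "iii":
        inner = 0.375 * eB * e4 - (0.5 - 0.25 * eB) * e2 + (0.5 + 1.375 * eB)
    else:
        raise ValueError("case must be 'i', 'ii' or 'iii'")
    return (1.0 + math.exp(-7.0 * mu) * inner) / 3.0
```

Evaluating the function on a grid of positive branch lengths gives values above 1/3 in all three cases, in line with the statement of the theorem, and letting both times go to 0 returns 1.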

The proof of Theorem 1 relies on the analysis of the discrete 7-state
Markov chain whose transition digraph is illustrated in Fig. 2. We will
therefore study this Markov chain thoroughly, before we dive into the proof
of the theorem.



4.1 The 7-state continuous-time Markov chain 



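Let $Z_t : t \ge 0$ be the 7-state continuous-time Markov chain defined by the
rate matrix

$$Q = \begin{pmatrix}
-3 & 2 & 0 & 0 & 0 & 0 & 1 \\
1 & -2 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & -3 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & -2 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & -3 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & -2 & 1 \\
1 & 0 & 0 & 0 & 0 & 2 & -3
\end{pmatrix}$$

and illustrated in Fig. 2, let $p_r(t) = \mathbb{P}(Z_t = r)$, and let
$p(t) = [p_0(t), \ldots, p_6(t)]$. Then by standard Markov chain theory [7]

$$\frac{d}{dt}p(t) = p(t)Q \quad \text{and} \quad p(t) = p(0)\exp(Qt).$$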



The Markov chain is easily seen to be irreducible and ergodic and thus
it has a stationary distribution $\pi$. To find $\pi$ let $P = (p_{ij})$ be the
transition probability matrix of $Q = (q_{ij})$ defined by

$$p_{ij} = \begin{cases} \dfrac{q_{ij}}{\sum_{k \neq i} q_{ik}} & \text{if } i \neq j \\[2ex] 0 & \text{otherwise.} \end{cases}$$

Then $\pi$ can be computed as

$$\pi = \frac{\phi D_Q^{-1}}{\phi D_Q^{-1} \mathbf{1}},$$

where $\phi$ is the unique solution summing to 1 of $\phi(I - P) = 0$, $\mathbf{1}$ is the
all-ones column vector, and $D_Q$ is the $7 \times 7$ diagonal matrix having the same
diagonal as $Q$ [7]. Using this we find

$$\pi = \left(\tfrac{1}{12}, \tfrac{2}{12}, \tfrac{2}{12}, \tfrac{2}{12}, \tfrac{2}{12}, \tfrac{2}{12}, \tfrac{1}{12}\right).$$



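The eigenvalues of $Q$ are

$$\lambda_1 = -5, \quad \lambda_2 = -4, \quad \lambda_3 = -4, \quad \lambda_4 = -3, \quad \lambda_5 = -1, \quad \lambda_6 = -1, \quad \lambda_7 = 0,$$

with corresponding eigenvectors:

$$\begin{aligned}
v_1 &= (-2, 1, -1, 0, 1, -1, 2) \\
v_2 &= (-1, 0, 1, 0, -1, 0, 1) \\
v_3 &= (-2, 1, 0, 1, -2, 1, 0) \\
v_4 &= (2, -1, -1, 2, -1, -1, 2) \\
v_5 &= (4, 3, -1, -3, -2, 0, 2) \\
v_6 &= (-2, -2, 0, 1, 1, 1, 0) \\
v_7 &= (1, 1, 1, 1, 1, 1, 1)
\end{aligned}$$

Figure 2: The 7 state continuous-time Markov model.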
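The spectral data above can be checked with exact arithmetic. The following sketch assumes a rate matrix `Q` reconstructed from the seven listed eigenvectors, the exponents appearing in $p_r(t)$, and the stationary distribution; the pairing of eigenvalues with eigenvectors in `EIGENPAIRS` is part of that reconstruction, and all names are hypothetical.

```python
from fractions import Fraction

# Assumed 7-state rate matrix (unit rate per edge of the underlying walk).
Q = [
    [-3, 2, 0, 0, 0, 0, 1],
    [1, -2, 1, 0, 0, 0, 0],
    [0, 1, -3, 1, 1, 0, 0],
    [0, 0, 1, -2, 1, 0, 0],
    [0, 0, 1, 1, -3, 1, 0],
    [0, 0, 0, 0, 1, -2, 1],
    [1, 0, 0, 0, 0, 2, -3],
]

# (eigenvalue, right eigenvector) pairs; the vectors are those listed in the text.
EIGENPAIRS = [
    (-5, (-2, 1, -1, 0, 1, -1, 2)),
    (-4, (-1, 0, 1, 0, -1, 0, 1)),
    (-4, (-2, 1, 0, 1, -2, 1, 0)),
    (-3, (2, -1, -1, 2, -1, -1, 2)),
    (-1, (4, 3, -1, -3, -2, 0, 2)),
    (-1, (-2, -2, 0, 1, 1, 1, 0)),
    (0, (1, 1, 1, 1, 1, 1, 1)),
]

# Stationary distribution stated in the text.
PI = [Fraction(w, 12) for w in (1, 2, 2, 2, 2, 2, 1)]


def mat_vec(matrix, vector):
    """Matrix times column vector."""
    return [sum(row[j] * vector[j] for j in range(7)) for row in matrix]


def vec_mat(vector, matrix):
    """Row vector times matrix."""
    return [sum(vector[i] * matrix[i][j] for i in range(7)) for j in range(7)]


def is_eigenpair(eigenvalue, vector):
    """True if Q v = eigenvalue * v holds exactly."""
    return mat_vec(Q, vector) == [eigenvalue * x for x in vector]
```

With these definitions every listed pair satisfies $Qv = \lambda v$ exactly, and `vec_mat(PI, Q)` is the zero vector, so the stated $\pi$ is stationary for this matrix.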



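Thus every element $p_r(t)$ for $r = 0, 1, \ldots, 6$ and $t \ge 0$ is of the form:

$$p_r(t) = a_r + b_r \exp(-t) + c_r \exp(-3t) + d_r \exp(-4t) + e_r \exp(-5t)$$

for constants $a_r, \ldots, e_r$ depending on $p(0)$. To find these constants we will
solve the set of linear equations given by

$$p(t) = [e^{-5t}, e^{-4t}, e^{-4t}, e^{-3t}, e^{-t}, e^{-t}, 1]\, X,$$

where $X$ is a $7 \times 7$ matrix containing the constants. Let $D$ be the $7 \times 7$
diagonal matrix with the eigenvalues of $Q$, $\lambda_1, \lambda_2, \ldots, \lambda_7$, being the diagonal
entries, and let $V$ be the $7 \times 7$ matrix with the eigenvector $v_i$ corresponding
to $\lambda_i$ being column $i$ of $V$. Then $Q = VDV^{-1}$, and we get

$$\begin{aligned}
p(t) &= p(0)\exp(Qt) \\
&= p(0)\exp(VDV^{-1}t) \\
&= p(0)V\exp(Dt)V^{-1} \\
&= p(0)V\,\mathrm{diag}(e^{-5t}, e^{-4t}, e^{-4t}, e^{-3t}, e^{-t}, e^{-t}, 1)\,V^{-1}.
\end{aligned}$$

Thus the set of linear equations we need to solve reduces to

$$X = \mathrm{diag}(p(0)V)\,V^{-1},$$

where $\mathrm{diag}(p(0)V)$ is the diagonal matrix carrying the entries of the row
vector $p(0)V$. Doing this with $p(0) = (1, 0, 0, 0, 0, 0, 0)$, we get

$$\begin{aligned}
p_0(t) &= \tfrac{1}{12} + \tfrac{1}{4}e^{-t} + \tfrac{1}{6}e^{-3t} + \tfrac{1}{4}e^{-4t} + \tfrac{1}{4}e^{-5t} \\
p_1(t) &= \tfrac{1}{6} + \tfrac{5}{12}e^{-t} - \tfrac{1}{6}e^{-3t} - \tfrac{1}{6}e^{-4t} - \tfrac{1}{4}e^{-5t} \\
p_2(t) &= \tfrac{1}{6} - \tfrac{1}{12}e^{-t} - \tfrac{1}{6}e^{-3t} - \tfrac{1}{6}e^{-4t} + \tfrac{1}{4}e^{-5t} \\
p_3(t) &= \tfrac{1}{6} - \tfrac{1}{3}e^{-t} + \tfrac{1}{3}e^{-3t} - \tfrac{1}{6}e^{-4t} \\
p_4(t) &= \tfrac{1}{6} - \tfrac{1}{4}e^{-t} - \tfrac{1}{6}e^{-3t} + \tfrac{1}{2}e^{-4t} - \tfrac{1}{4}e^{-5t} \\
p_5(t) &= \tfrac{1}{6} - \tfrac{1}{12}e^{-t} - \tfrac{1}{6}e^{-3t} - \tfrac{1}{6}e^{-4t} + \tfrac{1}{4}e^{-5t} \\
p_6(t) &= \tfrac{1}{12} + \tfrac{1}{12}e^{-t} + \tfrac{1}{6}e^{-3t} - \tfrac{1}{12}e^{-4t} - \tfrac{1}{4}e^{-5t}
\end{aligned} \tag{4}$$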



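From this representation of $p(t)$ we immediately get the following Lemma,
which will be useful in the process of proving Theorem 1.

Lemma 3 For all $t \ge 0$ and $p(0) = (1, 0, 0, 0, 0, 0, 0)$

Similarly if $p(0) = (0, 0, 0, 0, 0, 1, 0)$ we get

$$\begin{aligned}
p_0(t) &= \tfrac{1}{12} - \tfrac{1}{24}e^{-t} - \tfrac{1}{12}e^{-3t} - \tfrac{1}{12}e^{-4t} + \tfrac{1}{8}e^{-5t} \\
p_1(t) &= \tfrac{1}{6} - \tfrac{7}{24}e^{-t} + \tfrac{1}{12}e^{-3t} + \tfrac{1}{6}e^{-4t} - \tfrac{1}{8}e^{-5t} \\
p_2(t) &= \tfrac{1}{6} - \tfrac{5}{24}e^{-t} + \tfrac{1}{12}e^{-3t} - \tfrac{1}{6}e^{-4t} + \tfrac{1}{8}e^{-5t} \\
p_3(t) &= \tfrac{1}{6} - \tfrac{1}{6}e^{-t} - \tfrac{1}{6}e^{-3t} + \tfrac{1}{6}e^{-4t} \\
p_4(t) &= \tfrac{1}{6} + \tfrac{1}{24}e^{-t} + \tfrac{1}{12}e^{-3t} - \tfrac{1}{6}e^{-4t} - \tfrac{1}{8}e^{-5t} \\
p_5(t) &= \tfrac{1}{6} + \tfrac{11}{24}e^{-t} + \tfrac{1}{12}e^{-3t} + \tfrac{1}{6}e^{-4t} + \tfrac{1}{8}e^{-5t} \\
p_6(t) &= \tfrac{1}{12} + \tfrac{5}{24}e^{-t} - \tfrac{1}{12}e^{-3t} - \tfrac{1}{12}e^{-4t} - \tfrac{1}{8}e^{-5t}
\end{aligned} \tag{5}$$

and the following Lemma follows immediately:

Lemma 4 For all $t \ge 0$ and $p(0) = (0, 0, 0, 0, 0, 1, 0)$

4.2 Proof of part (i)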

Let $T$ be a four-taxon tree over the set of taxa $X = \{a, b, c, *\}$ with topology
$(ab; c; *)$ (here $*$ refers to the fourth taxon, the identity of which plays no role
when we come to consider the topology of the triple $a, b, c$), let $A = \{a, b, c\}$,
and let $\sigma = \sigma_1, \sigma_2, \ldots, \sigma_k$ be a random sequence of transfer events on $T$
generated by the standard LGT model in which the rate of transfer events
from point $p$ to point $p'$ is $\lambda$. Let $\xi$ be any one of the three events $\xi_{ab|c}$, $\xi_{ac|b}$
or $\xi_{bc|a}$, and let $J$ denote the (stochastic) number of $A$-joining transfers
between time $t = 0$ and time $t = t_2$. Then by the law of total probability

$$\begin{aligned}
\mathbb{P}(\xi) &= \mathbb{P}(\xi, J > 0) + \mathbb{P}(\xi, J = 0) \\
&= \mathbb{P}(\xi \mid J > 0)\,\mathbb{P}(J > 0) + \mathbb{P}(\xi \mid J = 0)\,\mathbb{P}(J = 0).
\end{aligned} \tag{6}$$

To find $\mathbb{P}(J > 0)$ and $\mathbb{P}(J = 0)$ we observe that $J$ has a Poisson distribution
with mean $2\lambda t_2$, since at any moment in the interval $[0, t_2]$ there are four
lineages, three of which lead to leaves in $A$, and for each of these, the rate
of transfer from that $A$-lineage to another $A$-lineage is $\lambda \cdot 2/3$. This means
that the cumulative rate of an $A$-joining transfer is $3 \cdot \lambda \cdot 2/3 = 2\lambda$ at any
given time in the interval $[0, t_2]$. Thus

$$\mathbb{P}(J = 0) = e^{-2\lambda t_2} \quad \text{and} \quad \mathbb{P}(J > 0) = 1 - e^{-2\lambda t_2},$$

and we arrive at

$$\mathbb{P}(\xi) = \mathbb{P}(\xi \mid J > 0)(1 - e^{-2\lambda t_2}) + \mathbb{P}(\xi \mid J = 0)\,e^{-2\lambda t_2}, \tag{7}$$

where $\xi$ is any one of the events $\xi_{ab|c}$, $\xi_{ac|b}$ or $\xi_{bc|a}$. We will now consider
the two factors $\mathbb{P}(\xi \mid J > 0)$ and $\mathbb{P}(\xi \mid J = 0)$ in turn.
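The mixing step in (7) is a two-term law of total probability weighted by the Poisson probability of seeing no $A$-joining transfer before $t_2$. A minimal sketch, with the hypothetical function name `total_probability` and the two conditional probabilities passed in as plain numbers:

```python
import math


def total_probability(p_given_join, p_given_no_join, lam, t2):
    """Combine the two conditional probabilities as in equation (7).

    p_given_join:    P(xi | J > 0)
    p_given_no_join: P(xi | J = 0)
    lam, t2:         transfer rate and MRCA time of the two closest taxa,
                     so that J is Poisson with mean 2 * lam * t2.
    """
    p_no_join = math.exp(-2.0 * lam * t2)  # P(J = 0)
    return p_given_join * (1.0 - p_no_join) + p_given_no_join * p_no_join
```

For large $\lambda t_2$ the result is dominated by the first argument, which is why the value of $\mathbb{P}(\xi \mid J > 0)$ determines the long-time behaviour.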

$\mathbb{P}(\xi \mid J > 0)$: Lemma 1(2b) tells us that if there is at least one $A$-joining
transfer in $\sigma$, then the first one of these decides the resulting topology of
$T[\sigma]|A$. There are 6 possibilities for this first $A$-joining transfer: $a \to b$,
$a \leftarrow b$, $a \to c$, $a \leftarrow c$, $b \to c$ and $b \leftarrow c$. The first two of these will give
$T[\sigma]|A = ab|c$, the next two will give $T[\sigma]|A = ac|b$, while the last two will
give $T[\sigma]|A = bc|a$. As they are all equally likely, we get

$$\mathbb{P}(\xi \mid J > 0) = \tfrac{1}{3}. \tag{8}$$

$\mathbb{P}(\xi \mid J = 0)$: When $J = 0$, Lemma 1(2c) tells us that we need to look
at the $A$-moving transfers, and Lemma 2 tells us that $T'_k$, as described in
the preamble of the two lemmas, will induce the same topology on $A$ as
$T[\sigma]$. The process of $A$-moving transfers between time $t = 0$ and $t = t_2$ is a
Poisson process in which the rate at which any given $x \in A$ is moved is $\tfrac{1}{3}\lambda$,
since each of the three $A$-lineages can be moved to only one ($*$) out of three
other lineages (otherwise it would be an $A$-joining transfer). Note that this
process is independent of $J$ as the source point of an $A$-joining transfer will
always have an element of $A$ as a descendant, whereas the source point of an
$A$-moving transfer will not. The walk in tree space, corresponding to moving
along the sequence $T'_0, T'_1, \ldots, T'_k$, as this process proceeds is described by
the Markov chain illustrated in Fig. 3 with rate $\tfrac{1}{3}\lambda$ of moving from any state
to each of its neighbors.

Now let $Z_t : t \ge 0$ be a continuous-time symmetric random walk on
the 12-cycle illustrated in Fig. 3, where the instantaneous rate of moving
from one node to one of its neighbors is $\tfrac{1}{3}\lambda$. As $T$ has topology $(ab; c; *)$
the random walk's initial state is state 1. The 7-state model, treated in
the previous section, is obtained from the model in Fig. 3 by grouping state
2 and 12, 3 and 11, 4 and 10, 5 and 9, and 6 and 8. So, accordingly, let
$p_r(t)$ for $r = 0, 1, \ldots, 6$ be the probability that, after running this process
for time $t$, $Z_t$ is at a state that can be reached in $r$ steps from state 1, taking
no diagonal edges, and let $p(t) = (p_0(t), p_1(t), p_2(t), p_3(t), p_4(t), p_5(t), p_6(t))$.
Then $p(0) = (1, 0, 0, 0, 0, 0, 0)$ and $p(t)$ behaves as described in Lemma 3
after rescaling time by a factor $\tfrac{1}{3}\lambda$.

Figure 3: The 12-cycle describing the walk in tree space corresponding to
moving along the sequence $T'_0, T'_1, \ldots, T'_k$.
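The grouping of the 12 states into 7 can be made concrete with a small enumeration. The sketch below rests on an assumed reading of the model, namely that an $A$-moving transfer swaps the moved label with the unlabeled lineage $*$ in a labelled pectinate tree written as (cherry; third; fourth), and that swapping the two leaves of the cherry changes nothing. The "diagonal" edges are taken to be the swaps between the third and fourth positions. All names are hypothetical.

```python
START = (("a", "b"), "c", "*")  # topology (ab; c; *), i.e. state 1


def canon(slots):
    """Canonical form of a labelled pectinate tree (cherry; third; fourth)."""
    x, y, z, w = slots
    return (tuple(sorted((x, y))), z, w)


def neighbours(state):
    """States reached by swapping '*' with one label; returns (cycle, diagonal) lists."""
    (x, y), z, w = state
    slots = [x, y, z, w]
    i = slots.index("*")
    cycle, diagonal = [], []
    for j in range(4):
        if j == i:
            continue
        swapped = slots[:]
        swapped[i], swapped[j] = swapped[j], swapped[i]
        nxt = canon(swapped)
        if nxt == state:
            continue  # a swap inside the cherry leaves the tree unchanged
        if {i, j} == {2, 3}:
            diagonal.append(nxt)  # swap between third and fourth position
        else:
            cycle.append(nxt)
    return cycle, diagonal


def cycle_distances(start=START):
    """Breadth-first distances from the start state using cycle edges only."""
    dist = {start: 0}
    frontier = [start]
    while frontier:
        next_frontier = []
        for s in frontier:
            for n in neighbours(s)[0]:
                if n not in dist:
                    dist[n] = dist[s] + 1
                    next_frontier.append(n)
        frontier = next_frontier
    return dist


def lumped_rate_matrix():
    """Rate matrix of the chain grouped by cycle distance (unit rate per edge)."""
    dist = cycle_distances()
    Q = [[0] * 7 for _ in range(7)]
    seen = set()
    for s, r in dist.items():
        row = [0] * 7
        cyc, diag = neighbours(s)
        for n in cyc + diag:
            row[dist[n]] += 1
        row[r] = -sum(row)
        if r in seen:
            assert row == Q[r], "states in one group must have equal group rates"
        Q[r] = row
        seen.add(r)
    return Q
```

Under these assumptions the enumeration yields 12 states, and the grouped rates agree for both members of every pair, which is the condition that makes the 7-state description a Markov chain in its own right.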

Let $T'_k$ be the state of the random walk on the 12-cycle at time $t_2$.
Lemma 2 then ensures that if $\sigma'$ is the sequence of $A$-moving transfers
between $t = 0$ and $t = t_2$, then $T[\sigma']$ resolves $a$, $b$ and $c$ in the same way as
$T'_k$ does. At time $t = t_2$ the random walk on the 12-cycle is in one of the
following states:

• 1, in which case $T[\sigma]|A = ab|c$ with probability 1 regardless of any LGT
events after $t_2$.

• 2 or 12, in which case

– $T[\sigma]|A = ab|c$ with probability $\tfrac{1}{3}$ if there is at least one transfer
event between $t_2$ and $t_3$, and

– $T[\sigma]|A = ac|b$ and $T[\sigma]|A = bc|a$ both have probability

* $\tfrac{1}{2}$ if there are no LGT events between $t_2$ and $t_3$, and

* $\tfrac{1}{3}$ if there is at least one transfer between $t_2$ and $t_3$.

• 3 or 11, in which case $T[\sigma]|A = ac|b$ and $T[\sigma]|A = bc|a$ both have
probability $\tfrac{1}{2}$ regardless of any LGT events after $t_2$.

• 4 or 10, in which case

– $T[\sigma]|A = ab|c$ with probability $\tfrac{1}{3}$ if there is at least one transfer
event between $t_2$ and $t_3$, and

– $T[\sigma]|A = ac|b$ and $T[\sigma]|A = bc|a$ both have probability

* $\tfrac{1}{2}$ if there are no LGT events between $t_2$ and $t_3$, and

* $\tfrac{1}{3}$ if there is at least one transfer between $t_2$ and $t_3$.

• 5 or 9, in which case $T[\sigma]|A = ac|b$ and $T[\sigma]|A = bc|a$ both have
probability $\tfrac{1}{2}$ regardless of any LGT events after $t_2$.

• 6 or 8, in which case

– $T[\sigma]|A = ab|c$ with probability

* 1 if there are no LGT events between $t_2$ and $t_3$, or

* $\tfrac{1}{3}$ if there is at least one transfer event between $t_2$ and $t_3$, and

– $T[\sigma]|A = ac|b$ and $T[\sigma]|A = bc|a$ both have probability $\tfrac{1}{3}$ if there
is at least one transfer between $t_2$ and $t_3$.

• 7, in which case $T[\sigma]|A = ab|c$ with probability 1 regardless of any LGT
events after $t_2$.

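Let $\mu = \tfrac{1}{3}\lambda t_2$ and $B = 3\lambda(t_3 - t_2)$. Then the probability that there is no
LGT event between $t_2$ and $t_3$ is $e^{-B}$, and the probability that there is at
least one LGT event in the same time span is thus $1 - e^{-B}$. Consequently
we get the following from combining the cases above:

$$\begin{aligned}
\mathbb{P}(\xi_{ab|c} \mid J = 0) ={}& p_0(\mu) \cdot 1 \\
&+ p_1(\mu) \cdot \tfrac{1}{3}(1 - e^{-B}) \\
&+ p_2(\mu) \cdot 0 \\
&+ p_3(\mu) \cdot \tfrac{1}{3}(1 - e^{-B}) \\
&+ p_4(\mu) \cdot 0 \\
&+ p_5(\mu) \cdot \bigl(e^{-B} + \tfrac{1}{3}(1 - e^{-B})\bigr) \\
&+ p_6(\mu) \cdot 1 \\
={}& p_0(\mu) + p_6(\mu) + p_5(\mu)e^{-B} + \tfrac{1}{3}(1 - e^{-B})\bigl(p_1(\mu) + p_3(\mu) + p_5(\mu)\bigr).
\end{aligned} \tag{9}$$

Similarly we get

$$\begin{aligned}
\mathbb{P}(\xi_{ac|b} \mid J = 0) = \mathbb{P}(\xi_{bc|a} \mid J = 0) ={}& \tfrac{1}{2}\bigl(p_2(\mu) + p_4(\mu)\bigr) + \tfrac{1}{2}e^{-B}\bigl(p_1(\mu) + p_3(\mu)\bigr) \\
&+ \tfrac{1}{3}(1 - e^{-B})\bigl(p_1(\mu) + p_3(\mu) + p_5(\mu)\bigr).
\end{aligned} \tag{10}$$

Now using Lemma 3 we get

$$\mathbb{P}(\xi_{ab|c} \mid J = 0) = \tfrac{1}{3}\Bigl(1 + \tfrac{3}{4}e^{-B}e^{-5\mu} + (1 - \tfrac{1}{2}e^{-B})e^{-3\mu} + (1 - \tfrac{1}{4}e^{-B})e^{-\mu}\Bigr), \tag{11}$$

and finally, using (7) and (8), we arrive at

$$\mathbb{P}(\xi_{ab|c}) = \tfrac{1}{3}\Bigl(1 + e^{-7\mu}\bigl(\tfrac{3}{4}e^{-B}e^{-4\mu} + (1 - \tfrac{1}{2}e^{-B})e^{-2\mu} + (1 - \tfrac{1}{4}e^{-B})\bigr)\Bigr). \tag{12}$$

From this it is easy to see that $\mathbb{P}(\xi_{ab|c}) > \tfrac{1}{3}$ for all positive values of $\mu$ and
$B$, as $\tfrac{3}{4}e^{-B}e^{-4\mu} > 0$, $(1 - \tfrac{1}{2}e^{-B})e^{-2\mu} > 0$ and $1 - \tfrac{1}{4}e^{-B} > 0$. Hence since
$\mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a})$ we get $\mathbb{P}(\xi_{ab|c}) > \tfrac{1}{3} > \mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a})$.

A plot of $\mathbb{P}(\xi_{ab|c})$ as a function of $\mu$ and $B$ is shown in Fig. 4.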
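The chain of steps (7), (8), (9) and (12) can be cross-checked numerically. The sketch below assumes the 7-state rate matrix `Q` given in the code (a reconstruction consistent with the eigenvectors of Section 4.1), computes $p(\mu)$ by uniformization rather than by the eigendecomposition used in the paper, and compares the result with the closed form (12). The names `distribution`, `match_via_chain` and `match_closed_form` are hypothetical.

```python
import math

# Assumed 7-state rate matrix, in units where each edge has rate 1.
Q = [
    [-3, 2, 0, 0, 0, 0, 1],
    [1, -2, 1, 0, 0, 0, 0],
    [0, 1, -3, 1, 1, 0, 0],
    [0, 0, 1, -2, 1, 0, 0],
    [0, 0, 1, 1, -3, 1, 0],
    [0, 0, 0, 0, 1, -2, 1],
    [1, 0, 0, 0, 0, 2, -3],
]


def distribution(start, t, terms=200):
    """Row vector p(t) = p(0) exp(Qt), computed by uniformization.

    Uses the stochastic matrix I + Q/3 and Poisson(3t) weights; this is a
    swapped-in numerical technique, not the method of the paper.
    """
    rate = 3.0
    step = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(7)]
            for i in range(7)]
    v = [1.0 if i == start else 0.0 for i in range(7)]
    weight = math.exp(-rate * t)
    out = [weight * x for x in v]
    for n in range(1, terms):
        v = [sum(v[i] * step[i][j] for i in range(7)) for j in range(7)]
        weight *= rate * t / n
        out = [o + weight * x for o, x in zip(out, v)]
    return out


def match_via_chain(lam, t2, t3):
    """P(xi_{ab|c}) for topology (ab; c; *) via equations (9), (7) and (8)."""
    mu = lam * t2 / 3.0
    eB = math.exp(-3.0 * lam * (t3 - t2))
    p = distribution(0, mu)
    no_join = p[0] + p[6] + p[5] * eB + (1.0 - eB) * (p[1] + p[3] + p[5]) / 3.0
    w = math.exp(-2.0 * lam * t2)  # P(J = 0)
    return (1.0 - w) / 3.0 + no_join * w


def match_closed_form(lam, t2, t3):
    """The closed form (12), under the same assumed scaling of mu and B."""
    mu = lam * t2 / 3.0
    eB = math.exp(-3.0 * lam * (t3 - t2))
    inner = (0.75 * eB * math.exp(-4.0 * mu)
             + (1.0 - 0.5 * eB) * math.exp(-2.0 * mu)
             + (1.0 - 0.25 * eB))
    return (1.0 + math.exp(-7.0 * mu) * inner) / 3.0
```

Agreement of the two functions over a range of branch lengths is a consistency check on the coefficients in (12), not a proof of the theorem.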

Figure 4: $\mathbb{P}(\xi_{ab|c})$ for four-taxon trees with the $(ab; c; *)$ topology as a
function of $B = 3\lambda(t_3 - t_2)$ and $\mu = \tfrac{1}{3}\lambda t_2$.

4.3 Proof of part (ii)

The proof of the claim in part (ii) of Theorem 1 is completely analogous
to the proof of part (i) up until the formulation of $\mathbb{P}(\xi_{ab|c} \mid J = 0)$ in (9). When
the original tree $T$ has topology $(ab; *; c)$, the random walk starts in state 7
(not state 1 as it did before). Because of the symmetries of the two Markov
models, this means that $p_0(\mu)$ and $p_6(\mu)$ swap places in (9), $p_1(\mu)$ and $p_5(\mu)$
swap places, and $p_2(\mu)$ and $p_4(\mu)$ swap places. Consequently we get

$$\mathbb{P}(\xi_{ab|c} \mid J = 0) = p_0(\mu) + p_6(\mu) + p_1(\mu)e^{-B} + \tfrac{1}{3}(1 - e^{-B})\bigl(p_1(\mu) + p_3(\mu) + p_5(\mu)\bigr)$$

and $\mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a})$. Now using Lemma 3 we get

$$\mathbb{P}(\xi_{ab|c} \mid J = 0) = \tfrac{1}{3}\Bigl(1 - \tfrac{3}{4}e^{-5\mu}e^{-B} + (1 - \tfrac{1}{2}e^{-B})e^{-3\mu} + (1 + \tfrac{5}{4}e^{-B})e^{-\mu}\Bigr).$$

And using (7) and (8) we arrive at

$$\mathbb{P}(\xi_{ab|c}) = \tfrac{1}{3}\Bigl(1 - e^{-7\mu}\bigl(\tfrac{3}{4}e^{-4\mu}e^{-B} - (1 - \tfrac{1}{2}e^{-B})e^{-2\mu} - (1 + \tfrac{5}{4}e^{-B})\bigr)\Bigr).$$

We will now show that $\mathbb{P}(\xi_{ab|c}) > \tfrac{1}{3}$ for all $\mu, B > 0$. From the above we
observe that $\mathbb{P}(\xi_{ab|c}) > \tfrac{1}{3}$ if and only if

$$\begin{gathered}
0 > \tfrac{3}{4}e^{-4\mu}e^{-B} - (1 - \tfrac{1}{2}e^{-B})e^{-2\mu} - (1 + \tfrac{5}{4}e^{-B}) \\
\Updownarrow \\
e^{-2\mu} + 1 > e^{-B}\bigl(\tfrac{3}{4}e^{-4\mu} + \tfrac{1}{2}e^{-2\mu} - \tfrac{5}{4}\bigr).
\end{gathered} \tag{13}$$

Now note that

$$\tfrac{3}{4}e^{-4\mu} + \tfrac{1}{2}e^{-2\mu} - \tfrac{5}{4} \to 0 \quad \text{for } \mu \to 0,$$

and

$$\tfrac{3}{4}e^{-4\mu} + \tfrac{1}{2}e^{-2\mu} - \tfrac{5}{4} \to -\tfrac{5}{4} \quad \text{for } \mu \to \infty.$$

Since this expression is strictly decreasing in $\mu$, we have
$\tfrac{3}{4}e^{-4\mu} + \tfrac{1}{2}e^{-2\mu} - \tfrac{5}{4} < 0$ for all $\mu > 0$. This means that (13) is true for
all $\mu, B > 0$ since the left-hand side is always positive and the right-hand
side is always negative. We conclude that $\mathbb{P}(\xi_{ab|c}) > \tfrac{1}{3}$ for all $\mu, B > 0$ and
hence for all values of $t_2$ and $t_3$.

A plot of $\mathbb{P}(\xi_{ab|c})$ as a function of $\mu$ and $B$ is shown in Fig. 5.

Figure 5: $\mathbb{P}(\xi_{ab|c})$ for four-taxon trees with the $(ab; *; c)$ topology as a
function of $B = 3\lambda(t_3 - t_2)$ and $\mu = \tfrac{1}{3}\lambda t_2$.

4.4 Proof of part (iii)

The proof of part (iii) of Theorem 1 is completely analogous to the proofs
of parts (i) and (ii) given in Sections 4.2 and 4.3 up until the computation of
$\mathbb{P}(\xi \mid J = 0)$.

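As in the previous two sections let $Z_t : t \ge 0$ be a continuous-time
symmetric random walk on the 12-cycle illustrated in Fig. 3, where the
instantaneous rate of moving from one node to one of its neighbors is $\tfrac{1}{3}\lambda$.
But this time, as $T$ has topology $(a*; b; c)$ or $(b*; a; c)$, the random walk starts
in either state 6 or 8. Let $p_0(t), p_1(t), \ldots, p_6(t)$ be defined as in Section 4.2
by

$$\begin{aligned}
p_0(t) &= \mathbb{P}(Z_t = 1) \\
p_1(t) &= \mathbb{P}(Z_t = 2 \text{ or } Z_t = 12) \\
p_2(t) &= \mathbb{P}(Z_t = 3 \text{ or } Z_t = 11) \\
p_3(t) &= \mathbb{P}(Z_t = 4 \text{ or } Z_t = 10) \\
p_4(t) &= \mathbb{P}(Z_t = 5 \text{ or } Z_t = 9) \\
p_5(t) &= \mathbb{P}(Z_t = 6 \text{ or } Z_t = 8) \\
p_6(t) &= \mathbb{P}(Z_t = 7),
\end{aligned}$$

and let $p(t) = (p_0(t), p_1(t), p_2(t), p_3(t), p_4(t), p_5(t), p_6(t))$. Then
$p(0) = (0, 0, 0, 0, 0, 1, 0)$ and $p(t)$ behaves as described in Lemma 4 after rescaling
time by a factor $\tfrac{1}{3}\lambda$. As in Section 4.2 we get

$$\mathbb{P}(\xi_{ab|c} \mid J = 0) = p_0(\mu) + p_6(\mu) + p_5(\mu)e^{-B} + \tfrac{1}{3}(1 - e^{-B})\bigl(p_1(\mu) + p_3(\mu) + p_5(\mu)\bigr)$$

and $\mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a})$. Using Lemma 4 we now get

$$\mathbb{P}(\xi_{ab|c} \mid J = 0) = \tfrac{1}{3}\Bigl(1 + \tfrac{3}{8}e^{-B}e^{-5\mu} - (\tfrac{1}{2} - \tfrac{1}{4}e^{-B})e^{-3\mu} + (\tfrac{1}{2} + \tfrac{11}{8}e^{-B})e^{-\mu}\Bigr),$$

and we finally arrive at

$$\mathbb{P}(\xi_{ab|c}) = \tfrac{1}{3}\Bigl(1 + e^{-7\mu}\bigl(\tfrac{3}{8}e^{-B}e^{-4\mu} - (\tfrac{1}{2} - \tfrac{1}{4}e^{-B})e^{-2\mu} + (\tfrac{1}{2} + \tfrac{11}{8}e^{-B})\bigr)\Bigr).$$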

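To show that $\mathbb{P}(\xi_{ab|c}) > \tfrac{1}{3}$ for all $\mu, B > 0$ we observe that $\mathbb{P}(\xi_{ab|c}) > \tfrac{1}{3}$ if
and only if

$$\begin{gathered}
0 < \tfrac{3}{8}e^{-B}e^{-4\mu} - (\tfrac{1}{2} - \tfrac{1}{4}e^{-B})e^{-2\mu} + (\tfrac{1}{2} + \tfrac{11}{8}e^{-B}) \\
\Updownarrow \\
\tfrac{1}{2}e^{-2\mu} - \tfrac{1}{2} < e^{-B}\bigl(\tfrac{3}{8}e^{-4\mu} + \tfrac{1}{4}e^{-2\mu} + \tfrac{11}{8}\bigr).
\end{gathered} \tag{14}$$

Note that

$$\tfrac{3}{8}e^{-4\mu} + \tfrac{1}{4}e^{-2\mu} + \tfrac{11}{8} \to 2 \quad \text{for } \mu \to 0,$$

and

$$\tfrac{3}{8}e^{-4\mu} + \tfrac{1}{4}e^{-2\mu} + \tfrac{11}{8} \to \tfrac{11}{8} \quad \text{for } \mu \to \infty.$$

Thus $\tfrac{3}{8}e^{-4\mu} + \tfrac{1}{4}e^{-2\mu} + \tfrac{11}{8} > 0$ for all $\mu > 0$. This means that (14) is true
for all $\mu, B > 0$, as the left-hand side is always negative, and the right-hand
side is always positive. We therefore conclude that
$\mathbb{P}(\xi_{ab|c}) > \tfrac{1}{3} > \mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a})$ for all values of $t_2$ and $t_3$.

A plot of $\mathbb{P}(\xi_{ab|c})$ as a function of $\mu$ and $B$ is shown in Fig. 6.

Figure 6: $\mathbb{P}(\xi_{ab|c})$ for four-taxon trees with either the $(a*; b; c)$ or the
$(b*; a; c)$ topology as a function of $B = 3\lambda(t_3 - t_2)$ and $\mu = \tfrac{1}{3}\lambda t_2$.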



5 Limits 



It is interesting and reaffirming to study the limits of the probabilities stated
in Theorem 1 when $t_2$, $t_3$ or $t_3 - t_2$ approaches 0. When $t_3$ approaches 0
we leave no time for any transfer events to occur before the first three taxa
have coalesced. We therefore expect to see that the triplet topology from the
species tree is preserved. And indeed $\mathbb{P}(\xi_{ab|c}) \to 1$ and $\mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a}) \to 0$
as $t_3 \to 0$ in all three cases of Theorem 1 (topology $(ab; c; *)$, $(ab; *; c)$ and
$(a*; b; c)$ or $(b*; a; c)$).

When $t_2$ approaches 0 we leave no time for any transfer events, splitting
up the two most closely related taxa, to occur, since such events would have
to happen before time $t_2$. Thus we expect that the grouping of these two is
preserved from the species tree topology. We can recognize this behaviour in
the limits for the topologies $(ab; c; *)$ and $(ab; *; c)$, where $\mathbb{P}(\xi_{ab|c}) \to 1$ and
$\mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a}) \to 0$ as $t_2$ approaches 0. Taxa $a$ and $b$ are here invariably
grouped together, and the matching topology is preserved regardless of any
transfer events after time $t_2$. If the species tree has topology $(a*; b; c)$ or
$(b*; a; c)$ then the species tree's triplet topology is preserved with probability
1 if no transfer events occur between $t_2$ and $t_3$ and with probability $\tfrac{1}{3}$ if at
least one transfer event occurs between $t_2$ and $t_3$ (such an event would be
an $A$-joining event). Similarly a mismatching topology can only be obtained
if at least one transfer event occurs between $t_2$ and $t_3$, in which case either
of the two topologies is obtained with probability $\tfrac{1}{3}$. Indeed we see from
Theorem 1 that $\mathbb{P}(\xi_{ab|c}) \to e^{-B} + \tfrac{1}{3}(1 - e^{-B})$ and $\mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a}) \to
\tfrac{1}{3}(1 - e^{-B})$ as $t_2$ approaches 0.
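As a worked check of the last limit, one can set $\mu = 0$ in the formula of part (iii). The derivation below assumes that formula in the form $\mathbb{P}(\xi_{ab|c}) = \tfrac{1}{3}(1 + e^{-7\mu}(\tfrac{3}{8}e^{-B}e^{-4\mu} - (\tfrac{1}{2} - \tfrac{1}{4}e^{-B})e^{-2\mu} + (\tfrac{1}{2} + \tfrac{11}{8}e^{-B})))$ with $\mu = \tfrac{1}{3}\lambda t_2$; the coefficients are a reconstruction and should be read as an assumption.

```latex
% Limit t_2 -> 0 (i.e. mu -> 0) of the assumed formula for topology (a*; b; c) or (b*; a; c).
\begin{align*}
\lim_{\mu \to 0} \mathbb{P}(\xi_{ab|c})
  &= \tfrac{1}{3}\Bigl(1 + \tfrac{3}{8}e^{-B} - \tfrac{1}{2} + \tfrac{1}{4}e^{-B}
     + \tfrac{1}{2} + \tfrac{11}{8}e^{-B}\Bigr) \\
  &= \tfrac{1}{3}\bigl(1 + 2e^{-B}\bigr) \\
  &= e^{-B} + \tfrac{1}{3}\bigl(1 - e^{-B}\bigr).
\end{align*}
```

The two mismatch topologies then share the remaining mass, $\tfrac{1}{2}\bigl(1 - e^{-B} - \tfrac{1}{3}(1 - e^{-B})\bigr) = \tfrac{1}{3}(1 - e^{-B})$ each, in agreement with the limit stated above.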

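When $t_3 - t_2$ approaches 0 the triplet topology in a gene tree entirely
depends on the transfer events taking place before time $t_2$. But since any
kind of events can happen in this period of time, this case is more complex
than the two previous cases. The limits obtained from the probabilities in
Theorem 1 are as follows:

• If $T$ has topology $(ab; c; *)$ then

$$\mathbb{P}(\xi_{ab|c}) \to \tfrac{1}{3}\Bigl(1 + e^{-7\mu}\bigl(\tfrac{3}{4} + \tfrac{1}{2}e^{-2\mu} + \tfrac{3}{4}e^{-4\mu}\bigr)\Bigr) > \tfrac{1}{3} \quad \forall \mu > 0, \text{ and}$$

$$\mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a}) \to \tfrac{1}{3}\Bigl(1 - e^{-7\mu}\bigl(\tfrac{3}{8} + \tfrac{1}{4}e^{-2\mu} + \tfrac{3}{8}e^{-4\mu}\bigr)\Bigr) < \tfrac{1}{3} \quad \forall \mu > 0.$$

• If $T$ has topology $(ab; *; c)$ then

$$\mathbb{P}(\xi_{ab|c}) \to \tfrac{1}{3}\Bigl(1 + e^{-7\mu}\bigl(\tfrac{9}{4} - \tfrac{3}{4}e^{-4\mu} + \tfrac{1}{2}e^{-2\mu}\bigr)\Bigr) > \tfrac{1}{3} \quad \forall \mu > 0, \text{ and}$$

$$\mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a}) \to \tfrac{1}{3}\Bigl(1 - e^{-7\mu}\bigl(\tfrac{9}{8} - \tfrac{3}{8}e^{-4\mu} + \tfrac{1}{4}e^{-2\mu}\bigr)\Bigr) < \tfrac{1}{3} \quad \forall \mu > 0.$$

• If $T$ has topology $(a*; b; c)$ or $(b*; a; c)$ then

$$\mathbb{P}(\xi_{ab|c}) \to \tfrac{1}{3}\Bigl(1 + e^{-7\mu}\bigl(\tfrac{15}{8} + \tfrac{3}{8}e^{-4\mu} - \tfrac{1}{4}e^{-2\mu}\bigr)\Bigr) > \tfrac{1}{3} \quad \forall \mu > 0, \text{ and}$$

$$\mathbb{P}(\xi_{ac|b}) = \mathbb{P}(\xi_{bc|a}) \to \tfrac{1}{3}\Bigl(1 - e^{-7\mu}\bigl(\tfrac{15}{16} + \tfrac{3}{16}e^{-4\mu} - \tfrac{1}{8}e^{-2\mu}\bigr)\Bigr) < \tfrac{1}{3} \quad \forall \mu > 0.$$

It is interesting to note that the probability of the triplet topology matching
the species tree always approaches a value strictly greater than $\tfrac{1}{3}$ as $t_3 - t_2$
approaches 0. This is in contrast to more familiar stochastic processes in
phylogenetics - such as lineage sorting and site substitution models - where
shrinking an interior branch length to zero results in a convergence to
$(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})$ in support for the three resolutions of the resulting trifurcation.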